Parallel Corpora for the Galician Language: Building and Processing of the CLUVI (Linguistic Corpus of the University of Vigo)
Authors
Abstract
In this paper, we present the methodology developed by the SLI (Computational Linguistics Group of the University of Vigo) for the building and processing of the CLUVI Corpus, showing the TMX-based XML specification designed to encode both morphosyntactic features and translation alignments in parallel corpora, and the solutions adopted for making the CLUVI parallel corpora freely available over the WWW (http://sli.uvigo.es/CLUVI/).
Similar resources
Etiquetario morfosintáctico del SLI para corpus de lengua gallega: aplicación al corpus paralelo TECTRA
In this article we present a complete and normalized morphosyntactic tagset for the annotation of linguistic corpora in Galician. The elaboration of this tagset, designed by the Computational Linguistics Group (SLI) of the University of Vigo in strict accordance with the EAGLES recommendations (Leech and Wilson, 1996), includes the creation of an intermediate tagset that allows us to establish a corr...
Using a Multimedia Parallel Corpus to Investigate English-Galician Subtitling
This paper presents an ongoing research project that involves the compilation and exploitation of the multimedia corpus of subtitled films Veiga as a method to investigate the practice of English intralingual subtitling and English-Galician interlingual subtitling. Our project draws on recent work in corpus-based translation studies and its applications in the field of audiovisual translation a...
Corpus based coreference resolution for Farsi text
"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
Vocabulary Lists for EAP and Conversation Students
Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...
Genre Analysis of ELT and Nursing Academic Written Discourse through Introduction
Since Swales’ (1981, 1990) work on the CARS model of move structure in research articles, many genre-analysis studies have been carried out, among which work on different parts of research articles in various disciplines has produced a considerable literature. This study aims to investigate the rhetorical structure of the Introduction sections of articles in two fields of English Language Teaching (...